An overview of the BioCreative 2012 Workshop Track III: interactive text mining task

Authors

  • Cecilia N. Arighi
  • Ben Carterette
  • K. Bretonnel Cohen
  • Martin Krallinger
  • W. John Wilbur
  • Petra Fey
  • Robert Dodson
  • Laurel Cooper
  • Ceri E. Van Slyke
  • Wasila M. Dahdul
  • Paula M. Mabee
  • Donghui Li
  • Bethany R. Harris
  • Marc Gillespie
  • Silvia Jimenez
  • Phoebe M. Roberts
  • Lisa Matthews
  • Kevin Becker
  • Harold J. Drabkin
  • Susan M. Bello
  • Luana Licata
  • Andrew Chatr-aryamontri
  • Mary L. Schaeffer
  • Julie Park
  • Melissa Haendel
  • Kimberly Van Auken
  • Yuling Li
  • Juancarlos Chan
  • Hans-Michael Müller
  • Hong Cui
  • James P. Balhoff
  • Johnny Chi-Yang Wu
  • Zhiyong Lu
  • Chih-Hsuan Wei
  • Catalina O. Tudor
  • Kalpana Raja
  • Suresh Subramani
  • Jeyakumar Natarajan
  • Juan Miguel Cejuela
  • Pratibha Dubey
  • Cathy H. Wu

Abstract

In many databases, biocuration primarily involves literature curation, which usually involves retrieving relevant articles, extracting information that will translate into annotations and identifying new incoming literature. As the volume of biological literature increases, the use of text mining to assist in biocuration becomes increasingly relevant. A number of groups have developed tools for text mining from a computer science/linguistics perspective, and there are many initiatives to curate some aspect of biology from the literature. Some biocuration efforts already make use of a text mining tool, but there have not been many broad-based systematic efforts to study which aspects of a text mining tool contribute to its usefulness for a curation task. Here, we report on an effort to bring together text mining tool developers and database biocurators to test the utility and usability of tools. Six text mining systems presenting diverse biocuration tasks participated in a formal evaluation, and appropriate biocurators were recruited for testing. The performance results from this evaluation indicate that some of the systems were able to improve efficiency of curation by speeding up the curation task significantly (∼1.7- to 2.5-fold) over manual curation. In addition, some of the systems were able to improve annotation accuracy when compared with the performance on the manually curated set. In terms of inter-annotator agreement, the factors that contributed to significant differences for some of the systems included the expertise of the biocurator on the given curation task, the inherent difficulty of the curation and attention to annotation guidelines. After the task, annotators were asked to complete a survey to help identify strengths and weaknesses of the various systems. 
The analysis of this survey highlights how important task completion is to the biocurators' overall experience of a system, regardless of the system's high score on design, learnability and usability. In addition, strategies to refine the annotation guidelines and systems documentation, to adapt the tools to the needs and query types the end user might have and to evaluate performance in terms of efficiency, user interface, result export and traditional evaluation metrics have been analyzed during this task. This analysis will help to plan for a more intense study in BioCreative IV.
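The abstract reports two kinds of quantities: a fold speed-up of assisted over manual curation, and inter-annotator agreement. It does not say how either was computed, so the following is only a minimal sketch under assumptions. The function names `fold_speedup` and `cohen_kappa` are hypothetical, the timings and labels are made-up illustration values, and Cohen's kappa is used as one common agreement measure, not necessarily the one used in the evaluation.

```python
from collections import Counter


def fold_speedup(manual_seconds, assisted_seconds):
    """Ratio of manual to tool-assisted curation time (>1 means the tool was faster)."""
    if assisted_seconds <= 0:
        raise ValueError("assisted_seconds must be positive")
    return manual_seconds / assisted_seconds


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    ASSUMPTION: the source does not name its agreement statistic;
    kappa is shown here purely as an illustration.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item set")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical timings: 50 s per article manually, 20 s with a tool.
    print(fold_speedup(50, 20))
    # Hypothetical relevance judgements from two curators.
    print(cohen_kappa(["y", "y", "n", "n"], ["y", "n", "n", "n"]))
```

Kappa corrects raw agreement for agreement expected by chance, which matters when one label dominates the set being curated.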

Similar resources

An Overview of the BioCreative Workshop 2012 Track III: Interactive

The BioCreAtIvE (Critical Assessment of Information Extraction systems in Biology) challenge evaluation is a community-wide effort for evaluating text mining and information extraction systems for the biological domain. The BioCreative Workshop 2012 subcommittee identified three areas, or tracks, that comprised independent, but complementary aspects of data curation in which they sought communi...

Biocuration workflows and text mining: overview of the BioCreative 2012 Workshop Track II

Manual curation of data from the biomedical literature is a rate-limiting factor for many expert curated databases. Despite the continuing advances in biomedical text mining and the pressing needs of biocurators for better tools, few existing text-mining tools have been successfully integrated into production literature curation systems such as those used by the expert curated databases. To clo...

Collaborative biocuration—text-mining development task for document prioritization for curation

The Critical Assessment of Information Extraction systems in Biology (BioCreAtIvE) challenge evaluation is a community-wide effort for evaluating text mining and information extraction systems for the biological domain. The 'BioCreative Workshop 2012' subcommittee identified three areas, or tracks, that comprised independent, but complementary aspects of data curation in which they sought commu...

Text mining in the biocuration workflow: applications for literature curation at WormBase, dictyBase and TAIR

WormBase, dictyBase and The Arabidopsis Information Resource (TAIR) are model organism databases containing information about Caenorhabditis elegans and other nematodes, the social amoeba Dictyostelium discoideum and related Dictyostelids and the flowering plant Arabidopsis thaliana, respectively. Each database curates multiple data types from the primary research literature. In this article, w...

Accelerating literature curation with text-mining tools: a case study of using PubTator to curate genes in PubMed abstracts

Today's biomedical research has become heavily dependent on access to the biological knowledge encoded in expert curated biological databases. As the volume of biological literature grows rapidly, it becomes increasingly difficult for biocurators to keep up with the literature because manual curation is an expensive and time-consuming endeavour. Past research has suggested that computer-assiste...

BioCreative IV Interactive Task

Fully automated text mining systems promote efficient literature searching, retrieval, and review but are not sufficient to produce ready-to-consume curated documents. These systems are not meant to replace curators, but they can assist in one or more biocuration steps. To do so, the interface with the curator is an important aspect that needs to be considered for tool adoption. The BioCreative...

Volume: 2013

Publication date: 2013